Published Sep 23, 2021 by Alex C-G
I’ve been working with Jina Flows and Executors a lot recently. I’ve really been enjoying it, and now I’m starting to really grok why.
Coding in Jina is very much like using a bash shell. (Or zsh. Or korn. Or take your pick)
In the shell you work with:
Commands: sed, grep, head, rm
Pipes and redirection: |, >, &>
Package managers: apt, yum, pamac
You use a package manager to install a command (if it isn’t already built-in). Then you can run commands and chain them together. For example, let’s say you installed Jina and want to add it to your requirements.txt:
yay -S ripgrep # yay is apt for Arch Linux
pip freeze | rg "jina" >> requirements.txt
I’m using ripgrep/rg instead of grep to illustrate the package management bit. (It’s like grep but way faster.)
Or something I do quite often: Run a Jina search, grab the resulting JSON, format it nicely, and yank it onto my clipboard. (yy is a simple alias I’ll put at the end of this post.)
curl --request POST -d '{"top_k":100,"mode":"search","data":["aliens and monsters"]}' -H 'Content-Type: application/json' 'http://0.0.0.0:45678/search' | jq | yy
These concepts are reflected in Jina’s design pattern:
Just as grep is a tool that does one thing and does it well, so too are TransformerTorchEncoder, CLIPImageEncoder, SimpleIndexer, etc.
You chain them together with .add(): Flow().add(uses=CLIPImageEncoder).add(uses=SimpleIndexer).
Rather than writing an Executor like TransformerTorchEncoder yourself, you simply download (or pull) it: Flow().add(uses="jinahub+docker://CLIPImageEncoder").
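To make the pipe analogy concrete, here is a toy sketch in plain Python. ToyFlow and the two step functions are hypothetical stand-ins I made up for illustration, not Jina's actual classes. The only point is that each .add() appends a step that receives the previous step's output, much like | does in the shell.

```python
class ToyFlow:
    """A hypothetical, minimal stand-in for a Flow: an ordered chain of steps."""

    def __init__(self):
        self.steps = []

    def add(self, step):
        # Return self so calls can be chained, like `cmd1 | cmd2 | cmd3`.
        self.steps.append(step)
        return self

    def run(self, data):
        # Feed each step's output into the next one.
        for step in self.steps:
            data = step(data)
        return data


def split_lines(text):
    """Break a blob of text into non-empty lines (a very crude 'sentencizer')."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def keep_love(lines):
    """Keep only lines that mention 'love' (a crude `grep -i love`)."""
    return [line for line in lines if "love" in line.lower()]


flow = ToyFlow().add(split_lines).add(keep_love)
result = flow.run("Love looks not with the eyes\nTo be, or not to be\n")
print(result)
```

A real Flow passes Documents between Executors rather than plain strings, but the chaining idea is the same.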
Let’s say I’m a hopeless romantic (you know I am). If I were going to index every one of Shakespeare’s lines that said “love” I’d do it like this in bash:
I mean I have them all memorized, but y’know
for filename in /shakespeare/*.txt; do
  cat "$filename" | grep "love" >> love_lines.txt # (Yes, I could use grep directly, but I wanna show off piping, mom)
done
Christ, I hate coding in bash.
And (more or less) like this in Jina:
# Create a doc for each of Shakey baby's works
docs = DocumentArray(from_files("shakespeare/*.txt"))
# Create simple Flow
flow = (
    Flow()
    .add(uses="jinahub+docker://Sentencizer")  # break down into sentences
    .add(uses="jinahub+docker://TransformerTorchEncoder")  # encode into vectors
    .add(uses="jinahub+docker://SimpleIndexer")  # build index
)
# Create a query Document
query_doc = Document(text="love")
with flow:
    # Index the Documents
    flow.index(inputs=docs)
    # Run the search Flow and store matches
    matches = flow.search(inputs=query_doc, return_results=True)
# See the matches
print(matches)
I skipped the imports and maybe a few bits of config you may want to add, but you get the idea:
Sentencizer, TransformerTorchEncoder, and SimpleIndexer are the commands. They do one thing and do it (hopefully) well.
With jinahub+docker:// we install them with the “package manager”.
.add is just like using | to pass the output of shell commands from one to another.
One difference: grep would miss any mention of loving, romance, heart, etc. TransformerTorchEncoder would see the connection.
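Here is a quick way to see what plain keyword matching misses, in a few lines of Python. The sample lines are made up for illustration (they aren't real Shakespeare quotes), and the substring test only mimics what grep "love" does. It doesn't show the encoder side at all.

```python
# Made-up sample lines, purely for illustration.
lines = [
    "I love thee",
    "She was loving and kind",
    "A tale of romance",
    "My heart is thine",
]

# Case-sensitive substring match, like `grep "love"`.
hits = [line for line in lines if "love" in line]
misses = [line for line in lines if "love" not in line]

print("matched:", hits)
print("missed:", misses)
```

Only the first line matches. Even "loving" slips through, because it doesn't contain the exact string "love".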
yy and pp
Okay, I promised I’d explain yy. If you use Vim, you may get the reference. I use these aliases to shuttle data between my clipboard and command line:
ls | yy - pipes the output of ls to the clipboard.
git clone `pp` - clones a git repo with the URL I just copied from my browser. (Those backticks pass the output of pp directly to the command via shell magic.)
Here are the aliases, along with gcpp, which I use a lot:
alias pp='xclip -o -selection clipboard'
alias yy='xclip -selection clipboard'
alias gcpp='git clone `xclip -o -selection clipboard`'